EDICS Category: SAS-MALN

Abstract — In this paper we propose a two-level hierarchical 
Bayesian model and an annealing schedule to re-enable the 
noise variance learning capability of the fast marginalized Sparse 
Bayesian Learning algorithms. Performance measures such as NMSE
and F-measure are improved by the annealing technique.
The algorithm tends to produce the sparsest solution under
moderate SNR scenarios and can outperform most concurrent
SBL algorithms while retaining a small computational load.

Index Terms - Bayesian methods, compressive sensing, sparse
Bayesian learning, fast marginalized, annealing



I. Introduction 

The Sparse Bayesian Learning (SBL) algorithms [1]-[3]
recast the solution to compressive sensing [4]-[6] in a
probabilistic way. One advantage of SBL over traditional
convex optimization algorithms [7], [8] is that it does not
require choosing regularization penalty parameters: the noise
variance $\sigma^2$, along with the other hyper-parameters, can be
learned automatically during the iterative procedure. Typical
SBL algorithms of this kind include EMSBL [1]-[3], [9] and
TMSBL [10]. To reduce the computational time of SBL
algorithms, fast marginalized methods [11], [12] have been
utilized, but those algorithms (BCS [13] and FLSBL [14])
require the user to specify a proper noise variance and lack an
automatic $\sigma^2$ learning capability. In this paper we propose a
two-level hierarchical Bayesian model and a novel annealing
technique to re-enable the noise learning capability of fast
marginalized algorithms. The proposed algorithm is fast and
outperforms most concurrent SBL algorithms in terms of NMSE
and the number of relevant basis vectors.

II. Bayesian Hierarchical Model 

The Single Measurement Vector (SMV) form of the sparse
signal reconstruction problem is:

$$ y = \Phi w + n, \tag{1} $$

where $y \in \mathbb{R}^{M \times 1}$ is the measurement vector, $\Phi$ is the $M \times N$
measurement matrix with $M \ll N$, $w \in \mathbb{R}^{N \times 1}$ is the signal
to be recovered, and $n$ is i.i.d. Gaussian noise with zero mean
and variance $\beta^{-1}$ (that is, $\sigma^2 = \beta^{-1}$).

In Bayesian modeling, each unknown quantity is modeled
as a stochastic variable. The two-level hierarchical Bayesian
model is constructed as:

$$ p(y \mid w, \beta) = \mathcal{N}(y \mid \Phi w, \beta^{-1} I), \tag{2} $$
$$ p(w \mid \gamma, B) = \prod_{i=1}^{N} \mathcal{N}(w_i \mid 0, \gamma_i B), \tag{3} $$

where (2) is called the observation model and (3) the signal model.
Each coefficient $w_i$ is modeled as a zero-mean Gaussian variable with
variance $\gamma_i B$, in which $B$ is a scalar that will be
investigated in detail in Section IV. We use the improper
hyper-prior [1] for the parameters $\gamma_i$, $\beta$ and $B$.

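The probability of the unknowns conditioned on the
observed data $y$ is

$$ p(w, \gamma, B, \beta \mid y) = p(w \mid y, \gamma, B, \beta)\, p(\gamma, B, \beta \mid y). \tag{4} $$

These parameters can be estimated using a Type II maximum
likelihood procedure [1]:

$$ \arg\max_{\gamma, B, \beta} \; p(y \mid \gamma, B, \beta). \tag{5} $$

Using Bayes' rule we have:

$$ p(w \mid y, \gamma, B, \beta)\, p(y \mid \gamma, B, \beta) = p(y \mid w, \beta)\, p(w \mid \gamma, B), \tag{6} $$

where the right-hand side of (6) is given in (2) and (3). We
can solve (6) using Gaussian identities:

$$ p(w \mid y, \gamma, B, \beta) = \mathcal{N}(w \mid \mu, \Sigma), \tag{7} $$
$$ p(y \mid \gamma, B, \beta) = \mathcal{N}(y \mid 0, C). \tag{8} $$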
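
As a minimal sketch of the Gaussian identities behind (7) and (8), assuming the prior covariance of $w$ is $B\,\mathrm{diag}(\gamma)$ as in the signal model, the posterior moments and the marginal covariance could be computed as follows. The function name `posterior_moments`, the problem sizes and the parameter values are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def posterior_moments(Phi, y, gamma, B, beta):
    """Return (mu, Sigma, C) for y = Phi w + n with
    w ~ N(0, B * diag(gamma)) and n ~ N(0, I / beta)."""
    M, N = Phi.shape
    prior_precision = np.diag(1.0 / (B * gamma))            # inverse prior covariance of w
    Sigma = np.linalg.inv(beta * Phi.T @ Phi + prior_precision)  # posterior covariance
    mu = beta * Sigma @ Phi.T @ y                            # posterior mean
    C = np.eye(M) / beta + B * (Phi * gamma) @ Phi.T         # marginal covariance of y
    return mu, Sigma, C

# Toy problem (sizes and values are arbitrary).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((5, 8))
y = rng.standard_normal(5)
gamma = rng.uniform(0.5, 2.0, size=8)
mu, Sigma, C = posterior_moments(Phi, y, gamma, B=0.5, beta=4.0)

# The same mean written through the marginal covariance C.
mu_alt = 0.5 * gamma * (Phi.T @ np.linalg.solve(C, y))
```

The two expressions for the mean agree by the matrix inversion lemma; the second form only needs an $M \times M$ solve, which is the cheaper route when $M \ll N$.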



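By taking the partial derivatives of $\mathcal{L} = \log p(y \mid \gamma, \beta, B)$ with
respect to $\gamma_i$, $B$, $\beta$ and setting them equal to 0, the update
rules of those hyper-parameters can be obtained:

$$ \gamma_i = \frac{\mu_i^2 + \Sigma_{ii}}{B}, \tag{9} $$
$$ B = \frac{1}{N}\left(\mathrm{Tr}[\Sigma A] + \mu^T A \mu\right), \tag{10} $$
$$ \beta = \frac{M}{\|y - \Phi\mu\|^2 + \mathrm{Tr}[\Sigma \Phi^T \Phi]}, \tag{11} $$

where $A = \mathrm{diag}(1/\gamma_i)$, $\mu_i$ is the $i$th element of $\mu$ and $\Sigma_{ii}$ is
the $i$th diagonal element of the matrix $\Sigma$. The derivation is similar
to TMSBL [10] except that we have explicitly modeled $B$ as a
scalar.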
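
One sweep of the hyper-parameter updates might look like the sketch below. It assumes the EM-style forms $\gamma_i = (\mu_i^2 + \Sigma_{ii})/B$, $B = (\mathrm{Tr}[\Sigma A] + \mu^T A \mu)/N$ and $\beta = M / (\|y - \Phi\mu\|^2 + \mathrm{Tr}[\Sigma\Phi^T\Phi])$, which is our reading of the update rules for this model; the function name `em_sweep` and the toy problem are hypothetical.

```python
import numpy as np

def em_sweep(Phi, y, gamma, B, beta):
    """One pass over the assumed EM-style updates for gamma, B and beta."""
    M, N = Phi.shape
    A = np.diag(1.0 / gamma)
    Sigma = np.linalg.inv(beta * Phi.T @ Phi + A / B)   # posterior covariance
    mu = beta * Sigma @ Phi.T @ y                       # posterior mean
    second_moment = mu**2 + np.diag(Sigma)              # E[w_i^2 | y]
    gamma_new = second_moment / B
    B_new = (np.trace(Sigma @ A) + mu @ A @ mu) / N     # uses the old gamma through A
    residual = y - Phi @ mu
    beta_new = M / (residual @ residual + np.trace(Sigma @ Phi.T @ Phi))
    return gamma_new, B_new, beta_new, mu

# Toy problem: two active coefficients, light noise (values are arbitrary).
rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 40))
w_true = np.zeros(40)
w_true[[3, 17]] = [1.0, -1.0]
y = Phi @ w_true + 0.05 * rng.standard_normal(20)
gamma, B, beta, mu = em_sweep(Phi, y, np.ones(40), 1.0, 1.0)
```

Only the product $\gamma_i B$ enters the model, so in practice one of the two is held fixed while the other is updated; the annealing scheme of Section IV fixes $B$ from a schedule.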

III. Fast Marginalized Implementations 
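We rewrite the covariance of $p(y \mid \gamma, \beta, B)$ as:

$$ C = \beta^{-1} I + B \sum_{j} \gamma_j \phi_j \phi_j^T \tag{12} $$
$$ \quad = \beta^{-1} I + B \sum_{j \neq i} \gamma_j \phi_j \phi_j^T + B \gamma_i \phi_i \phi_i^T \tag{13} $$
$$ \quad = C_{-i} + B \gamma_i \phi_i \phi_i^T, \tag{14} $$

where $\phi_i$ is the $i$th column (basis) of $\Phi$ and $C_{-i}$ denotes
$C$ with the contribution of the $i$th basis excluded. This
equation has an additional weight $B$ compared with BCS [13]
and FLSBL [14]. The update rule for $\gamma_i$ is

$$ \gamma_i = \frac{q_i^2 - s_i}{B s_i^2}, \tag{15} $$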
where $s_i$ and $q_i$ are defined as $s_i = \phi_i^T C_{-i}^{-1} \phi_i$ and $q_i = \phi_i^T C_{-i}^{-1} y$.
The process of Add, Delete, and Re-Estimate is identical
to Tipping [11], [12]. For a given basis $i$, the change of $\mathcal{L}$
under Add, Delete, and Re-Estimate is denoted as $\Delta\mathcal{L}(\gamma_i) =
\mathcal{L}(\tilde{\gamma}_i) - \mathcal{L}(\gamma_i)$, where $\tilde{\gamma}_i$ is the updated value of $\gamma_i$. After
calculating $\Delta\mathcal{L}(\gamma_i)$ for all $i$, the basis that maximizes the change
in $\mathcal{L}$ is selected for update to speed up convergence.
The change of log-likelihood is also used to test the
convergence of the algorithm.
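
The quantities $s_i$, $q_i$ and the resulting change of log-likelihood can be sketched as follows. This is a direct, non-incremental computation for illustration only, not the fast update itself; it assumes $\gamma_i = (q_i^2 - s_i)/(B s_i^2)$ with the basis pruned ($\gamma_i = 0$) when $q_i^2 \le s_i$, a condition taken from Tipping's analysis and assumed to carry over unchanged with the extra weight $B$. All function names are hypothetical.

```python
import numpy as np

def marginal_cov(Phi, gamma, B, beta):
    """C = I / beta + B * Phi diag(gamma) Phi^T."""
    return np.eye(Phi.shape[0]) / beta + B * (Phi * gamma) @ Phi.T

def marginal_loglik(Phi, y, gamma, B, beta):
    """log p(y | gamma, B, beta) up to an additive constant."""
    C = marginal_cov(Phi, gamma, B, beta)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + y @ np.linalg.solve(C, y))

def sparsity_quality(Phi, y, gamma, B, beta, i):
    """s_i and q_i computed with basis i excluded from C."""
    g = gamma.copy()
    g[i] = 0.0                                   # drop the contribution of basis i
    C_minus_i = marginal_cov(Phi, g, B, beta)
    phi_i = Phi[:, i]
    s_i = phi_i @ np.linalg.solve(C_minus_i, phi_i)
    q_i = phi_i @ np.linalg.solve(C_minus_i, y)
    return s_i, q_i

def gamma_update(s_i, q_i, B):
    """Assumed update: (q_i^2 - s_i) / (B s_i^2), or 0 when the basis is pruned."""
    return (q_i**2 - s_i) / (B * s_i**2) if q_i**2 > s_i else 0.0

# Toy problem: y is dominated by basis 4, the model starts empty.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((10, 15))
y = 2.0 * Phi[:, 4] + 0.1 * rng.standard_normal(10)
gamma = np.zeros(15)
s, q = sparsity_quality(Phi, y, gamma, B=0.5, beta=10.0, i=4)
g_star = gamma_update(s, q, B=0.5)

# Change of log-likelihood when basis 4 is added with its updated gamma.
gamma_added = gamma.copy()
gamma_added[4] = g_star
delta_L = (marginal_loglik(Phi, y, gamma_added, 0.5, 10.0)
           - marginal_loglik(Phi, y, gamma, 0.5, 10.0))
```

A fast implementation would obtain $s_i$ and $q_i$ from cached quantities instead of solving with $C_{-i}$ for every basis; the direct form above is only meant to make the definitions concrete.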

Note that each time $\beta$ or $B$ is altered,
all the quantities of the fast marginalized algorithm, such as
$s_i$, $q_i$, $\mu$ and $\Sigma$, must be re-calculated. We denote this as the process
of Update (see [12]).

IV. The Annealing SBL 
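The role of $B$ is analyzed by exploring the structure of $C$:

$$ C = \beta^{-1} I + B \Phi A^{-1} \Phi^T. \tag{16} $$

As Wipf and Zhang [10] pointed out, given $\Phi = [\Phi', I]$,
where $I$ is the $M \times M$ identity matrix, the above equation can be
rewritten as:

$$ C = \beta^{-1} I + B \Phi' A'^{-1} \Phi'^T \tag{17} $$
$$ \quad + B\, \mathrm{diag}(\gamma_{N-M+1}, \cdots, \gamma_N), \tag{18} $$

where $\Phi'$ holds the first $N - M$ columns of $\Phi$ and $A'$ is the
corresponding block of $A$.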

With $B = 1$, a nonzero value of $\beta$ and $M$ nonzero values
of $\gamma_{N-M+1}, \cdots, \gamma_N$ make an identical contribution to the
covariance matrix $C$; thus $\beta$ and $\gamma$ are not identifiable, which
leads to degraded performance [10]. This is especially true
in BCS [13] and FLSBL [14] due to the constructive and
reconstructive manner of the algorithms. We also observe that
when $B$ takes a small value, the share of the learning error
of $\gamma$ in the overall $C$ is minimized and $\beta$
dominates the covariance matrix. During the iterative learning
process, the accuracy of $\gamma$ improves, which suggests that the
restriction on $\gamma$ in the covariance matrix $C$ can be released
by increasing the value of $B$.
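
The identifiability problem can be checked numerically. The sketch below assumes the dictionary has the block form $\Phi = [\Phi', I]$ discussed above and builds two different parameter settings that give the same $C$ when $B = 1$; the sizes, seeds and variance values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 10
Phi_prime = rng.standard_normal((M, N - M))
Phi = np.hstack([Phi_prime, np.eye(M)])          # Phi = [Phi', I]

def marginal_cov(gamma, B, beta):
    """C = I / beta + B * Phi diag(gamma) Phi^T."""
    return np.eye(M) / beta + B * (Phi * gamma) @ Phi.T

# Setting 1: noise variance 0.3, last M variances switched off.
gamma1 = np.concatenate([rng.uniform(0.5, 1.5, N - M), np.zeros(M)])
C1 = marginal_cov(gamma1, B=1.0, beta=1 / 0.3)

# Setting 2: noise variance 0.1, the missing 0.2 moved into the last M variances.
gamma2 = gamma1.copy()
gamma2[-M:] = 0.2
C2 = marginal_cov(gamma2, B=1.0, beta=1 / 0.1)

same_cov = np.allclose(C1, C2)                   # the data cannot tell the two apart

# With a small B, the same change in gamma moves C much less.
shift_B1 = np.linalg.norm(marginal_cov(gamma2, 1.0, 10.0) - marginal_cov(gamma1, 1.0, 10.0))
shift_B01 = np.linalg.norm(marginal_cov(gamma2, 0.1, 10.0) - marginal_cov(gamma1, 0.1, 10.0))
```

The second comparison illustrates why starting from a small $B$ lets $\beta$ dominate the covariance: the influence of an error in $\gamma$ on $C$ scales linearly with $B$.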

Inspired by the continuation strategy of Hale [15] and
simulated annealing methods, it is possible to select an
arbitrarily large noise variance as the initial value of $\sigma^2$ and
adopt an increasing sequence of $B$ to obtain the solution
$w$ and an estimate of $\sigma^2$, as illustrated in Figure 1.

[Figure 1: a chain of $T_A$ blocks; block $i$ reads "select $B_i$, reestimate $\sigma^2$, process Update" for $i = 1, 2, \ldots, T_A$.]

Fig. 1. The Annealing SBL Scheme. The number of annealing steps is denoted as $T_A$.
At each step $i$, once the annealing update criterion is met, a new value $B_i$ is
selected and $\beta$ is calculated. All the quantities of the algorithm such as $\mu$, $\Sigma$,
$s_i$ and $q_i$ should also be Updated.

In what follows, we give explicit formulas for the key
ingredients of the annealing schedule: (1) the annealing update
criterion and (2) the annealing step size.



A. The Annealing Update Criteria

In Ji [13] and Babacan [14], the program converges when
the change between consecutive $\Delta\mathcal{L}$ is less than a fraction $\eta$ of the change
between the current $\Delta\mathcal{L}(k)$ and the first $\Delta\mathcal{L}(1)$, which is

$$ \frac{|\Delta\mathcal{L}(k) - \Delta\mathcal{L}(k-1)|}{|\Delta\mathcal{L}(k) - \Delta\mathcal{L}(1)|} < \eta. \tag{19} $$

Given an annealing step size $T_A$, the annealing criterion is

$$ \eta_A = \sqrt[T_A]{\eta}, \tag{20} $$
$$ \frac{|\Delta\mathcal{L}(k_A) - \Delta\mathcal{L}(k_A - 1)|}{|\Delta\mathcal{L}(k_A) - \Delta\mathcal{L}(1_A)|} < \eta_A. \tag{21} $$

In the above equation, $k_A$ is the iteration number at
annealing temperature $A$. Each annealing step
is designed to decrease the change of log-likelihood to a fraction
of $\Delta\mathcal{L}(1)$, so that the overall exit criterion is the same
as in the BCS [13] and FLSBL [14] algorithms. We also find
that (20) is too slow when $B_i$ takes the value 1. We thus modify (20)
with a scaling factor:

$$ \eta_A = \sqrt[\alpha T_A]{\eta}, \tag{22} $$

and $\alpha = 2$ is used in our experiments.
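
A small sketch of the annealing criterion follows. It assumes the per-step threshold is the $T_A$-th root of the global threshold $\eta$, and the $\alpha T_A$-th root after scaling, which is our reading of the criterion rather than a confirmed formula; the helper names `annealing_threshold` and `criterion_met` are hypothetical.

```python
def annealing_threshold(eta, T_A, alpha=1.0):
    """Assumed per-step threshold eta ** (1 / (alpha * T_A)); alpha = 1 is the unscaled rule."""
    return eta ** (1.0 / (alpha * T_A))

def criterion_met(dL_k, dL_prev, dL_first, threshold):
    """Relative-change test |dL(k) - dL(k-1)| / |dL(k) - dL(1)| < threshold."""
    denom = abs(dL_k - dL_first)
    return denom > 0 and abs(dL_k - dL_prev) / denom < threshold

eta, T_A = 1e-4, 8
eta_plain = annealing_threshold(eta, T_A)              # unscaled per-step threshold
eta_scaled = annealing_threshold(eta, T_A, alpha=2.0)  # looser threshold, met sooner

# Example check with made-up log-likelihood changes.
step_done = criterion_met(dL_k=1.0, dL_prev=1.1, dL_first=5.0, threshold=eta_scaled)
```

Under this reading, multiplying the unscaled per-step thresholds over all $T_A$ steps recovers $\eta$, which matches the statement that the overall exit criterion equals that of BCS and FLSBL; the scaled threshold is larger, so each annealing step ends earlier.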



B. The Annealing Steps $T_A$

The increasing sequence of $B$ is uniformly spaced in the
interval $[0.1, 1]$ with $T_A$ steps. This parameter is analyzed
in detail in the next section. Each time the annealing
criterion is met, we select the next $B_i$ in the sequence.
The noise variance $\sigma^2$ is updated automatically in the fast marginalized
algorithm, and its initial value can be selected arbitrarily.
For convenience we simply let $\sigma^2 = \|y\|^2$.
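
The schedule itself is simple to write down. The sketch below assumes that "uniformly divided" means equally spaced values that include both end points 0.1 and 1; the function name `annealing_sequence` and the toy measurement vector are hypothetical.

```python
def annealing_sequence(T_A, b_min=0.1, b_max=1.0):
    """Increasing, equally spaced values of B from b_min to b_max (both included)."""
    if T_A < 2:
        return [b_max]
    step = (b_max - b_min) / (T_A - 1)
    return [b_min + k * step for k in range(T_A)]

B_seq = annealing_sequence(8)

# Initial noise variance set to the squared norm of a toy measurement vector.
y_toy = [0.5, -1.0, 2.0]
sigma2_init = sum(v * v for v in y_toy)
```

Each time the annealing criterion is met, the next entry of `B_seq` would be taken, so the final entry $B = 1$ coincides with the model without the extra weight.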

C. The ASBL Algorithm

The Annealing SBL algorithm is given in Fig. 2. The
proposed method is called ASBL in the remainder of this
paper.

    procedure ASBL($\Phi$, $y$, $\eta$, $\sigma^2$, $T_A$)
        while Global Convergence is not met do
            if Annealing Criterion is met then
                select $B_i$ from $B$.
                update $\beta$ using eq. (11).
                process Update.
            end if
            process BCS routine.
        end while
    end procedure

Fig. 2. The ASBL Algorithm

V. Experiments 

A signal $w$ of length $N$ is generated with $T$ nonzero weights
at random locations; the amplitude of each nonzero weight is
sampled from uniform $\pm 1$ random spikes. The measurement
matrix $\Phi \in \mathbb{R}^{M \times N}$ is a uniform spherical ensemble, with each
column $\phi_i$ uniformly distributed on the unit sphere in $\mathbb{R}^M$. The noise $n$ is an
i.i.d. Gaussian variable with variance $\sigma_n^2$, and the standard
deviation of the noiseless measurements $y_s = \Phi w$ is $\sigma_y$. The
signal-to-noise ratio (SNR) is defined as $\mathrm{SNR} = 20 \log_{10}(\sigma_y / \sigma_n)$.
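
A test problem of this kind could be generated as in the sketch below. It relies on the standard fact that a normalized Gaussian vector is uniformly distributed on the unit sphere; the function name `make_problem`, its defaults and the seed are illustrative assumptions.

```python
import numpy as np

def make_problem(M=100, N=512, T=10, snr_db=10.0, seed=0):
    """Generate (Phi, w, y, sigma_n, y_clean) for the SMV model y = Phi w + n."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((M, N))
    Phi /= np.linalg.norm(Phi, axis=0)             # columns uniform on the unit sphere
    w = np.zeros(N)
    support = rng.choice(N, size=T, replace=False)  # T random nonzero locations
    w[support] = rng.choice([-1.0, 1.0], size=T)    # +/-1 spikes
    y_clean = Phi @ w
    # SNR = 20 log10(sigma_y / sigma_n)  =>  sigma_n = sigma_y / 10^(SNR / 20)
    sigma_n = np.std(y_clean) / 10 ** (snr_db / 20.0)
    y = y_clean + sigma_n * rng.standard_normal(M)
    return Phi, w, y, sigma_n, y_clean

Phi, w, y, sigma_n, y_clean = make_problem()
```

Here $\sigma_n$ is set from the requested SNR exactly, while the realized noise sample will of course deviate slightly from it.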

We compare the proposed method with the algorithms
EMSBL [1], [9], TMSBL [10], BCS [13] and FLSBL
[14]. In moderate SNR scenarios, the F-measure of support
recovery (F-index) [10] is used as a performance index,
defined by $F = |\theta_c| / |\theta_t|$ with $\theta_c = \theta_e \cap \theta_t$, where
$\theta_t$ is the set of locations of the nonzero entries of the true signal $w$ and $\theta_e$ is the
set of locations of the $T$ largest-magnitude entries of the estimated signal $\hat{w}$. We also
calculate the normalized mean square error (NMSE), defined
by $\|\hat{w} - w\|_2^2 / \|w\|_2^2$, as well as the CPU time and the number
of relevant basis vectors, denoted as $N_b$.
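
The two accuracy measures can be sketched as below, assuming the estimated support is taken as the $T$ entries of largest magnitude; the function names `f_index` and `nmse` and the small example vectors are hypothetical.

```python
import numpy as np

def f_index(w_true, w_est):
    """|theta_c| / |theta_t|: fraction of the true support found among the
    T largest-magnitude entries of the estimate."""
    T = np.count_nonzero(w_true)
    true_support = set(np.flatnonzero(w_true).tolist())
    est_support = set(np.argsort(np.abs(w_est))[-T:].tolist())
    return len(true_support & est_support) / len(true_support)

def nmse(w_true, w_est):
    """Normalized mean square error ||w_est - w_true||^2 / ||w_true||^2."""
    return np.sum((w_est - w_true) ** 2) / np.sum(w_true ** 2)

# Toy example: one of the two true spikes is recovered.
w_true = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
w_est = np.array([0.1, 0.9, 0.0, -0.05, 0.0])
F = f_index(w_true, w_est)
err = nmse(w_true, w_est)
```

In this example the estimate keeps the spike at index 1 but misses the one at index 3, so half of the true support is recovered.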

For simplicity, in the following experiments we fix $M =
100$, $N = 512$ and vary $T$ and SNR to test these algorithms
under different sparsity and noise levels. A phase
transition plot is used to illustrate how the sparsity level (defined by
$\rho = T/M$) and the noise level affect the success of the algorithm,
where success is declared when the average F-index exceeds
0.9. A point above the phase curve indicates failure, while
a point below the curve indicates success. We vary the SNR from 5 dB
to 25 dB with a 5 dB step size and run each experiment for 100
iterations.

A. The choice of $T_A$

The phase transitions of ASBL with different $T_A$ values
are plotted in Figure 3. The values $T_A = 8, 10$ have similar
performance, while $T_A = 8$ takes fewer annealing steps and a smaller
computational load. We will choose $T_A = 8$ as the default
parameter of ASBL in the remaining experiments.

Fig. 3. The Choice of $T_A$. The phase transition with different sparsity levels
and SNR is plotted. We set the annealing steps of ASBL to $T_A = 4, 6, 8, 10$
and the convergence criterion $\eta = 10^{-4}$.

B. The Phase Transition 

ASBL is inherently BCS with an additional $B$ annealing schedule
and $\sigma^2$ learning capability. In this experiment we
plot the phase transition with respect to different sparsity and
SNR levels for ASBL, BCS, FLSBL, EMSBL and TMSBL.
The true noise variance is selected as the initial value for
BCS and FLSBL. For EMSBL and TMSBL, the automatic $\sigma^2$
learning capability is toggled on. The result is shown in Figure
4. The performance of ASBL, BCS and FLSBL is inferior to
EMSBL and TMSBL when SNR > 10 dB and the sparsity level
$\rho > 0.2$. This is an open issue, largely due to the constructive
and reconstructive nature of the fast marginalized method,
which will be explored in our next paper. The ASBL algorithm
performs better than BCS and FLSBL, and it does so
without any prior knowledge of the true noise
variance. The advantage of introducing additional annealing
steps will be analyzed in detail in the next experiment.

[Figure 4: phase transition curves; horizontal axis SNR (dB).]

Fig. 4. The phase transitions of ASBL, BCS, FLSBL, EMSBL and TMSBL.
The phase transition is plotted with respect to different sparsity and SNR
levels. Each point on the phase curve corresponds to an average F-index
larger than or equal to 0.9.

C. The performance comparison of different SBL algorithms
on 1D data

In this experiment we fix $M = 100$, $T = 10$, SNR = 10 dB
and compare different SBL algorithms in terms of NMSE, CPU
time and the number of relevant basis vectors $N_b$. The simulation
results are plotted in Figure 5. Interestingly, the ASBL
algorithm seems to attain the lowest NMSE
among those algorithms when SNR < 15 dB. The average time
of TMSBL and EMSBL is 3 s and 5 s respectively, while ASBL
takes only a little longer than BCS and FLSBL. The number
of relevant basis vectors $N_b$ of ASBL is the smallest among all the
algorithms, which means that the proposed method produces
the sparsest solution under moderate SNR scenarios. This
is notable given its superior performance in NMSE
and CPU time.

D. The $\sigma^2$ learning capabilities of SBL algorithms

The $\sigma^2$ estimated by ASBL, together with the true noise variance
and the $\sigma^2$ estimated by the other SBL algorithms, is plotted in
Figure 6. We can see that BCS, FLSBL and EMSBL tend to
under-estimate the noise variance during the learning process,
while TMSBL with the advanced $\sigma^2$ learning option toggled
on tends to over-estimate the noise level. BCS and FLSBL
have the same slope as ASBL. Among those algorithms,
ASBL shows the best $\sigma^2$ learning performance, which means
that, with an arbitrarily large initial noise variance ($\sigma^2 = \|y\|^2$)
as the starting point, the proposed annealing method gradually
seeks the balance between data misfit and regularization penalty
and produces an accurate estimate of $\sigma^2$ when the annealing
procedure stops.
Fig. 5. NMSE, CPU time and $N_b$ versus SNR with different SBL algorithms.
The automatic $\sigma^2$ learning option is toggled on for EMSBL and TMSBL. The
initial value of $\sigma^2$ for BCS and FLSBL is set to the true noise variance. The
convergence criteria are $\eta = 10^{-4}$ for ASBL, BCS and FLSBL, and $\eta = 10^{-8}$
for EMSBL and TMSBL.



VI. Conclusion 

In this paper we propose an annealing SBL (ASBL) algorithm
and an implementation using the fast marginalized method.
The ASBL algorithm is free of user-tuned parameters and
can automatically update the noise variance $\sigma^2$ to lock onto the
optimum performance during the learning process. ASBL
tends to produce the sparsest solution under moderate
SNR (SNR < 15 dB) and its performance is superior to
TMSBL, EMSBL, BCS and FLSBL in terms of NMSE.
These properties are very attractive for signal reconstruction
from noisy measurements. The proposed method is based on the fast
marginalized implementation and its CPU time is far less
than that of EMSBL and TMSBL, which should win ASBL a broad area
of applications.

Fig. 6. The estimated noise variance versus SNR with different SBL
algorithms. EMSBL and TMSBL have automatic $\sigma^2$ learning capability.
For BCS and FLSBL, $\sigma^2$ is updated when the convergence criterion has
been met.